german credit data
Optimizing Over the Returned Counterfactuals
In this appendix, we discuss a technique to optimize over the counterfactuals found by counterfactual explanation methods, such as [6]. We restate Lemma 3.1 and provide a proof. Lemma 3.1. Assuming the counterfactual algorithm $A(x)$ follows the form of the objective in Equation 1, $\frac{\partial}{\partial x_{cf}} G(x, A(x)) = 0$, and $m$ is the number of parameters in the model, we can write the derivative of the counterfactual algorithm $A$ with respect to the model parameters $\theta$ as the Jacobian $\frac{\partial}{\partial \theta} A(x) = -\left[\frac{\partial^2 G(x, A(x))}{\partial x_{cf}^2}\right]^{-1} \cdot \left[\frac{\partial}{\partial \theta} \frac{\partial}{\partial x_{cf}} G(x, x_{cf})\right]$ (7). This problem is identical to a well-studied class of bi-level optimization problems in deep learning. In these problems, we must compute the derivative of a function with respect to some parameter (here, $\theta$) that includes an inner argmin, which itself depends on the parameter. We follow [44] to complete the proof.
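As a numerical sanity check of the implicit-differentiation identity behind the lemma, the sketch below compares the Jacobian $-[\partial^2 G / \partial x_{cf}^2]^{-1} [\partial^2 G / \partial\theta\,\partial x_{cf}]$ against central finite differences. It assumes a toy quadratic objective $G(x_{cf}; \theta) = \tfrac{1}{2} x_{cf}^\top H x_{cf} - x_{cf}^\top B \theta$, chosen only because its argmin has a closed form; it is not the objective of the counterfactual method in [6]. The names `inner_argmin`, `implicit_jacobian`, `finite_difference_jacobian`, `H`, and `B` are hypothetical.

```python
import numpy as np

def inner_argmin(theta, H, B):
    # Minimiser of the toy objective: solve dG/dx_cf = H x_cf - B theta = 0.
    return np.linalg.solve(H, B @ theta)

def implicit_jacobian(H, B):
    # Implicit function theorem:
    # dx_cf/dtheta = -[d2G/dx_cf2]^{-1} [d2G/dtheta dx_cf].
    hessian = H   # second derivative of G with respect to x_cf
    mixed = -B    # derivative of (H x_cf - B theta) with respect to theta
    return -np.linalg.solve(hessian, mixed)

def finite_difference_jacobian(theta, H, B, eps=1e-6):
    # Central differences, one column per model parameter.
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        plus = inner_argmin(theta + step, H, B)
        minus = inner_argmin(theta - step, H, B)
        cols.append((plus - minus) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
d, m = 3, 4                  # d: dimension of x_cf, m: number of parameters
M = rng.normal(size=(d, d))
H = M @ M.T + np.eye(d)      # symmetric positive definite, so the argmin is unique
B = rng.normal(size=(d, m))
theta = rng.normal(size=m)

J_implicit = implicit_jacobian(H, B)
J_numeric = finite_difference_jacobian(theta, H, B)
print(np.allclose(J_implicit, J_numeric, atol=1e-6))
```

For a non-quadratic objective the same formula applies at the inner solution, but the Hessian and mixed derivative would have to be evaluated there, typically with automatic differentiation.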
Gradient-Optimized Fuzzy Classifier: A Benchmark Study Against State-of-the-Art Models
Sieverding, Magnus, Steffen, Nathan, Cohen, Kelly
This paper presents a performance benchmarking study of a Gradient-Optimized Fuzzy Inference System (GF) classifier against several state-of-the-art machine learning models, including Random Forest, XGBoost, Logistic Regression, Support Vector Machines, and Neural Networks. The evaluation was conducted across five datasets from the UCI Machine Learning Repository, each chosen for its diversity in input types, class distributions, and classification complexity. Unlike traditional Fuzzy Inference Systems that rely on derivative-free optimization methods, the GF leverages gradient descent to significantly improve training efficiency and predictive performance. Results demonstrate that the GF model achieved competitive, and in several cases superior, classification accuracy while maintaining high precision and exceptionally low training times. In particular, the GF exhibited strong consistency across folds and datasets, underscoring its robustness in handling noisy data and variable feature sets. These findings support the potential of gradient-optimized fuzzy systems as interpretable, efficient, and adaptable alternatives to more complex deep learning models in supervised learning tasks.
Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions
As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts of retraining ML models with model-annotated samples that incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.
German Credit Data (Part 1): Exploratory Data Analysis
Data analytics is the process of analyzing raw data to extract information and draw conclusions from it. It is an important part of data science because it helps businesses optimize their performance, reduce costs, and increase overall efficiency. When a bank receives a loan application, it has to decide whether to approve the loan, and it bases that decision on the applicant's profile.
The Impact of Data Preparation on the Fairness of Software Systems
Valentim, Inês, Lourenço, Nuno, Antunes, Nuno
Machine learning models are widely adopted in scenarios that directly affect people. The development of software systems based on these models raises societal and legal concerns, as their decisions may lead to the unfair treatment of individuals based on attributes like race or gender. Data preparation is key in any machine learning pipeline, but its effect on fairness is yet to be studied in detail. In this paper, we evaluate how the fairness and effectiveness of the learned models are affected by the removal of the sensitive attribute, the encoding of the categorical attributes, and instance selection methods (including cross-validators and random undersampling). We used the Adult Income and the German Credit Data datasets, which are widely studied and known to have fairness concerns. We applied each data preparation technique individually to analyse the difference in predictive performance and fairness, using statistical parity difference, disparate impact, and the normalised prejudice index. The results show that fairness is affected by transformations made to the training data, particularly in imbalanced datasets. Removing the sensitive attribute is insufficient to eliminate all the unfairness in the predictions, as expected, but it is key to achieving fairer models. Additionally, the standard random undersampling with respect to the true labels is sometimes more prejudicial than performing no random undersampling.
Software systems based on machine learning (ML) are being used at an increasing rate and in a multitude of scenarios that have a significant impact on people's lives. Their ubiquity raises several legal and societal concerns, as decisions based on the output of ML models may introduce or perpetuate historical bias against some individuals, based on their intrinsic characteristics, such as race, gender or age.
The use of automated decision-making systems is often appealing due to the gains associated with it, and might even be perceived as a step towards the eradication of personal bias from the process. Nevertheless, a careless adoption of decisions supported by these systems carries many risks. In this context, fairness emerges as a key property in terms of the reliability and trustworthiness of software systems based on ML. These systems now receive increased attention from regulatory institutions, with the recently approved European Union General Data Protection Regulation (GDPR) requiring organisations to handle personal data in a privacy-preserving, fair and transparent manner [1].
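To make two of the fairness metrics named in the abstract above concrete, here is a minimal sketch of statistical parity difference and disparate impact computed from binary predictions. It assumes the common convention that both metrics compare the favourable-prediction rate of the unprivileged group against that of the privileged group; the paper's exact sign and group conventions may differ. The function names and the toy arrays are hypothetical, and the normalised prejudice index is not covered.

```python
import numpy as np

def selection_rate(y_pred, group_mask):
    # Fraction of the group that receives the favourable prediction (1).
    return float(np.mean(y_pred[group_mask]))

def statistical_parity_difference(y_pred, sensitive, privileged=1):
    # P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged); 0 means parity.
    priv = sensitive == privileged
    return selection_rate(y_pred, ~priv) - selection_rate(y_pred, priv)

def disparate_impact(y_pred, sensitive, privileged=1):
    # Ratio of the same two selection rates; 1 means parity.
    priv = sensitive == privileged
    return selection_rate(y_pred, ~priv) / selection_rate(y_pred, priv)

# Toy data (illustrative only): 1 marks the privileged group in `sensitive`.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
sensitive = np.array([1, 1, 1, 1, 0, 0, 0, 0])

spd = statistical_parity_difference(y_pred, sensitive)
di = disparate_impact(y_pred, sensitive)
print(spd, di)
```

In this toy data the privileged group is selected at a rate of 0.75 and the unprivileged group at 0.25, so the difference is negative and the ratio is below 1, both indicating that the unprivileged group is favoured less often.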
Auditing Black-box Models for Indirect Influence
Adler, Philip, Falk, Casey, Friedler, Sorelle A., Rybeck, Gabriel, Scheidegger, Carlos, Smith, Brandon, Venkatasubramanian, Suresh
Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different features influence the model prediction. This is important when interpreting the behavior of complex models, or asserting that certain problematic attributes (like race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the dataset, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might indirectly influence outcomes via other, related features. As a result, we can find attribute influences even in cases where, upon further direct examination of the model, the attribute is not referred to by the model at all. Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures.